
    Global permutation tests for multivariate ordinal data: alternatives, test statistics, and the null dilemma

    We discuss two-sample global permutation tests for sets of multivariate ordinal data in possibly high-dimensional setups, motivated by the analysis of data collected by means of the World Health Organisation's International Classification of Functioning, Disability and Health. The tests do not require any modelling of the multivariate dependence structure. Specifically, we consider testing for marginal inhomogeneity and direction-independent marginal order. Max-T test statistics are known to lead to good power against alternatives with few strong individual effects. We propose test statistics that can be seen as their counterparts for alternatives with many weak individual effects. Permutation tests are valid only if the two multivariate distributions are identical under the null hypothesis. By means of simulations, we examine the practical impact of violations of this exchangeability condition. Our simulations suggest that theoretically invalid permutation tests can still be 'practically valid'. In particular, they suggest that the degree of the permutation procedure's failure may be considered as a function of the difference in group-specific covariance matrices, the ratio of the group sizes, the number of variables in the set, the test statistic used, and the number of levels per variable.
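    The general idea of a two-sample global permutation test can be sketched as follows. This is only an illustration under stated assumptions, not the statistics proposed in the paper: ordinal levels are coded as integers, the per-variable statistic is an absolute difference in means, and the function names `mean_diffs` and `permutation_pvalue` are hypothetical. Passing `combine=max` gives a max-type statistic (sensitive to a few strong effects), while `combine=sum` is one simple way to aggregate many weak effects.

```python
import random


def mean_diffs(x, y):
    """Absolute difference in per-variable means between two groups.

    Assumption: ordinal levels are coded as integers and compared by their
    means; this is a simple stand-in, not the statistic from the paper.
    """
    p = len(x[0])
    return [
        abs(sum(row[j] for row in x) / len(x) - sum(row[j] for row in y) / len(y))
        for j in range(p)
    ]


def permutation_pvalue(x, y, combine=max, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for a global two-sample test.

    combine=max yields a max-type statistic; combine=sum adds up all variables.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = combine(mean_diffs(x, y))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if combine(mean_diffs(pooled[:n_x], pooled[n_x:])) >= observed:
            hits += 1
    # The +1 counts the observed labelling itself, so the p-value is never 0.
    return (hits + 1) / (n_perm + 1)


# Toy data: 10 subjects per group, 3 ordinal variables coded 1..5.
group_a = [[1, 1, 1] for _ in range(10)]
group_b = [[5, 5, 5] for _ in range(10)]
p_max = permutation_pvalue(group_a, group_b, combine=max)
p_sum = permutation_pvalue(group_a, group_b, combine=sum)
```

    Shuffling whole rows keeps each subject's variables together, so the dependence between variables is preserved without being modelled. The procedure is exact only when the two joint distributions coincide under the null hypothesis, which is the exchangeability condition the abstract refers to.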

    SHrinkage Covariance Estimation Incorporating Prior Biological Knowledge with Applications to High-Dimensional Data

    In "-omic data" analysis, information on the structure of covariates is broadly available, either from public databases describing gene regulation processes and functional groups, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), or from statistical analyses, for example in the form of partial correlation estimators. The analysis of transcriptomic data might benefit from the incorporation of such prior knowledge. In this paper we focus on the integration of structured information into statistical analyses in which at least one major step involves the estimation of a (high-dimensional) covariance matrix. More precisely, we revisit the recently proposed "SHrinkage Incorporating Prior" (SHIP) covariance estimation method, which takes into account the group structure of the covariates, and suggest integrating the SHIP covariance estimator into various multivariate methods such as linear discriminant analysis (LDA), global analysis of covariance (GlobalANCOVA), and regularized generalized canonical correlation analysis (RGCCA). We demonstrate the use of the resulting new methods based on simulations and discuss the benefit of integrating prior information through the SHIP estimator. Reproducible R code is available at http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/shipproject/index.html
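    A shrinkage covariance estimator of this general kind combines the sample covariance matrix with a structured target. The sketch below illustrates only that generic form, lam * target + (1 - lam) * sample covariance. It is not the SHIP estimator itself: the target used here (keep variances and within-group covariances, set everything else to zero) and the fixed shrinkage intensity `lam` are assumptions made for illustration, and all function names are hypothetical.

```python
def sample_cov(data):
    """Unbiased sample covariance; rows are observations, columns are variables."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def group_target(cov, groups):
    """Hypothetical target: keep variances and within-group covariances, zero the rest."""
    p = len(cov)
    return [
        [cov[i][j] if groups[i] == groups[j] else 0.0 for j in range(p)]
        for i in range(p)
    ]


def shrink(cov, target, lam):
    """Convex combination lam * target + (1 - lam) * cov, with 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    p = len(cov)
    return [
        [lam * target[i][j] + (1.0 - lam) * cov[i][j] for j in range(p)]
        for i in range(p)
    ]


# Toy data: 4 observations of 3 variables; variables 0 and 1 share a group.
data = [[1.0, 2.0, 0.5], [2.0, 2.5, 1.5], [3.0, 4.0, 1.0], [4.0, 4.5, 3.0]]
groups = [0, 0, 1]
cov = sample_cov(data)
shrunk = shrink(cov, group_target(cov, groups), lam=0.5)
```

    With this target, covariances between variables from different groups are pulled towards zero, while within-group entries are left unchanged. In practice the shrinkage intensity would be estimated from the data rather than fixed by hand.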

    Global tests of association for multivariate ordinal data

    Global tests are in demand whenever it is of interest to draw inferential conclusions about sets of variables as a whole. The present thesis attempts to develop such tests for the case of multivariate ordinal data in possibly high-dimensional set-ups, and has primarily been motivated by research questions that arise from data collected by means of the 'International Classification of Functioning, Disability and Health'. The thesis essentially comprises two parts. In the first part two tests are discussed, each of which addresses one specific problem in the classical two-group scenario. Since both are permutation tests, their validity relies on the condition that, under the null hypothesis, the joint distribution of the variables in the set to be tested is the same in both groups. Extensive simulation studies based on the proposed tests suggest, however, that violations of this condition do not, from a purely practical viewpoint, automatically lead to invalid tests. Rather, the failure of two-sample permutation tests appears to depend on numerous parameters, such as the ratio of the group sizes, the number of variables in the set of interest and, importantly, the test statistic used. In the second part two further tests are developed, both of which can be used to test for association, if desired after adjustment for certain covariates, between a set of ordinally scaled covariates and an outcome variable within the range of generalized linear models. The first test rests upon explicit assumptions on the distances between the covariates' categories, and is shown to be a proper generalization of the traditional Cochran-Armitage test to higher dimensions, covariate-adjusted scenarios and generalized linear model-specific outcomes. The second test in turn parametrizes these distances and thus keeps them flexible. Based on the tests' power properties, practical recommendations are provided on when to favour one or the other, and connections with the permutation tests from the first part of the thesis are pointed out. For illustration of the methods developed, data from two studies based on the 'International Classification of Functioning, Disability and Health' are analyzed. The results promise vast potential of the proposed tests in this data context and beyond.

    Over-optimism in bioinformatics: an illustration

    In statistical bioinformatics research, different optimization mechanisms potentially lead to "over-optimism" in published papers. The present empirical study illustrates these mechanisms through a concrete example from an active research field. The investigated sources of over-optimism include the optimization of the data sets, of the settings, of the competing methods and, most importantly, of the method's characteristics. We consider a "promising" new classification algorithm that turns out to yield disappointing results in terms of error rate, namely linear discriminant analysis incorporating prior knowledge on gene functional groups through an appropriate shrinkage of the within-group covariance matrix. We quantitatively demonstrate that this disappointing method can artificially seem superior to existing approaches if we "fish for significance". We conclude that, if the improvement of a quantitative criterion such as the error rate is the main contribution of a paper, the superiority of new algorithms should be validated using "fresh" validation data sets.

    A Cochran-Armitage-type and a score-free global test for multivariate ordinal data

    We propose a Cochran-Armitage-type and a score-free global test that can be used to assess the presence of an association between a set of ordinally scaled covariates and an outcome variable within the range of generalized linear models. Both tests are developed within the framework of the well-established 'global test' methodology and as such are feasible in high-dimensional data situations under any correlation and enable adjustment for covariates. The Cochran-Armitage-type test, for which an intimate connection with the traditional score-based Cochran-Armitage test is shown, rests upon explicit assumptions on the distances between the covariates' ordered categories. In contrast, the score-free test parametrizes these distances and thus keeps them flexible, rendering it ideally suited for covariates measured on an ordinal scale. As confirmed by means of simulations, the Cochran-Armitage-type test focuses its power on set-outcome relationships where the distances between the covariates' categories are equal or close to those assumed, whereas the score-free test spreads its power over the full range of possible set-outcome relationships, putting more emphasis on monotonic than on non-monotonic ones. Based on the tests' power properties, we discuss when to favour one or the other, and the practical merits of both are illustrated by an application in the field of rehabilitation medicine. Our proposed tests are implemented in the R package globaltest.
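    For orientation, the traditional score-based Cochran-Armitage trend test mentioned in the abstract can be sketched for a single ordinal covariate and a binary outcome. This is the classical univariate test with a normal approximation, not the proposed global tests (those are implemented in the R package globaltest). The function name `cochran_armitage` and the toy counts are assumptions for illustration. The `scores` argument is where the assumed distances between categories enter.

```python
import math


def cochran_armitage(cases, totals, scores):
    """Classical Cochran-Armitage trend test for a 2 x k table.

    cases[i]  : number of events in ordered category i
    totals[i] : number of subjects in category i
    scores[i] : numeric score assigned to category i (the assumed spacing)
    Returns (z, two-sided p-value from the normal approximation).
    """
    n = sum(totals)
    p_bar = sum(cases) / n  # pooled event proportion
    # Score-weighted sum of observed minus expected events per category.
    t = sum(s * (r - m * p_bar) for s, r, m in zip(scores, cases, totals))
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    var = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n)
    if var <= 0:
        raise ValueError("variance is zero; the test is undefined for this table")
    z = t / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Toy table: 10 subjects per category, event counts rising with the category.
cases, totals = [2, 5, 8], [10, 10, 10]
z_equal, p_equal = cochran_armitage(cases, totals, scores=[0, 1, 2])    # equal spacing
z_uneven, p_uneven = cochran_armitage(cases, totals, scores=[0, 1, 4])  # wider last gap
```

    Changing the scores changes the statistic, which is why a test of this type is most powerful when the assumed spacing is close to the true one.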
